# Multilingual OCR
PP OCRv4 Mobile Det
Apache-2.0
PP-OCRv4_mobile_det is an efficient text detection model developed by the PaddleOCR team. It is optimized for mobile devices and suitable for deployment on edge devices.
Text Recognition Supports Multiple Languages
PaddlePaddle
360
0
PP OCRv5 Mobile Rec
Apache-2.0
PP-OCRv5_mobile_rec is the latest-generation text line recognition model developed by the PaddleOCR team. It recognizes Simplified Chinese, Traditional Chinese, English, and Japanese, and is suitable for a variety of complex text scenarios.
Text Recognition Supports Multiple Languages
PaddlePaddle
499
0
PP OCRv5 Server Rec
Apache-2.0
PP-OCRv5_server_rec is the latest-generation text line recognition model developed by the PaddleOCR team, supporting multilingual recognition and complex text scenarios.
Text Recognition Supports Multiple Languages
PaddlePaddle
8,601
0
Florence Base Mixed Line Bbox Ocr
MIT
An image-to-text model fine-tuned from Microsoft's Florence-2 foundation model. It supports Swedish and English and specializes in historical handwritten text recognition and optical character recognition.
Image-to-Text
Safetensors
nazounoryuu
112
0
Mistral Small 1
MIT
An image-text-to-text model built on Mistral-Small-3.1-24B-Instruct-2503, supporting multilingual processing.
Image-to-Text
Safetensors Supports Multiple Languages
CreitinGameplays
109
1
Internvl3 2B AWQ
Other
InternVL3-2B is a Multimodal Large Language Model (MLLM) developed by OpenGVLab with strong multimodal perception and reasoning capabilities. It supports tool usage, GUI agents, industrial image analysis, 3D visual perception, and more.
OpenGVLab
677
1
Paligemma2 3b Mix 224 Jax
PaliGemma 2 is an upgraded vision-language model based on Gemma 2. It takes multilingual image and text input, produces text output, and is designed for vision-language tasks.
Image-to-Text
google
38
1
Minicpm O 2 6 Int4
The int4 quantized version of MiniCPM-o 2.6, significantly reducing GPU VRAM usage while supporting multimodal processing capabilities.
Text-to-Audio
Transformers Other
openbmb
4,249
42
Paligemma2 28b Mix 224
PaliGemma 2 is an upgraded vision-language model launched by Google, combining the capabilities of Gemma 2 and SigLIP vision models, supporting multilingual image-text interaction tasks.
Image-to-Text
Transformers
google
2,050
4
Paligemma2 28b Mix 448
PaliGemma 2 is a vision-language model based on Gemma 2, supporting image+text input and text output, suitable for various vision-language tasks.
Image-to-Text
Transformers
google
198
26
Paligemma2 10b Mix 224
PaliGemma 2 is a vision-language model based on Gemma 2, supporting image and text input to generate text output, suitable for various vision-language tasks.
Image-to-Text
Transformers
google
701
7
Paligemma2 3b Mix 448
PaliGemma 2 is a vision-language model based on Gemma 2, supporting image and text inputs with text generation output, suitable for various vision-language tasks.
Image-to-Text
Transformers
google
20.55k
44
Trocr Nepali
A Devanagari optical character recognition model based on the TrOCR architecture, fine-tuned specifically for Nepali text in Devanagari script.
Text Recognition
Transformers Other
syubraj
175
0
Thai Trocr
Apache-2.0
A Thai and English optical character recognition model fine-tuned from the TrOCR base handwriting model. It performs well on images of handwritten text lines.
Text Recognition
Transformers Supports Multiple Languages
openthaigpt
2,677
9
Urdu Ocr
This model is specifically trained for Urdu OCR tasks and is most suitable for processing single-line Urdu text images, primarily focusing on printed text.
Text Recognition
Transformers Other
cxfajar197
114
1
Trocr Medieval Cursiva
MIT
This is a TrOCR-based medieval cursive script recognition model, specifically designed for identifying handwritten texts in Latin, French, Italian, Spanish, and Catalan from the medieval period.
Text Recognition
Transformers Supports Multiple Languages
medieval-data
18
1
Trocr Base Ru
Apache-2.0
TrOCR-Ru is an optical character recognition model based on microsoft/trocr-base-handwritten and fine-tuned on synthetic Russian and English datasets for image-to-text tasks.
Text Recognition
Transformers Supports Multiple Languages
sherstpasha99
30
0
Trocr Base Finetune Numbers
TrOCR is a Transformer-based optical character recognition model designed to extract text content from images.
Image-to-Text
Transformers English
ANANDHU-SCT
23
0
Trocr Base Ckb
An OCR model based on the Transformer architecture, designed to recognize Central Kurdish text and trained on synthetic data.
Text Recognition
Transformers
razhan
19
0
Pix2struct Ocrvqa Base
Apache-2.0
Pix2Struct is a visual question answering model fine-tuned for OCR-VQA tasks, capable of parsing textual content in images and answering questions about it.
Image-to-Text
Transformers Supports Multiple Languages
google
38
1
Pix2struct Docvqa Base
Apache-2.0
Pix2Struct is an image encoder-text decoder model trained on image-text pairs, supporting various tasks including image captioning and visual question answering.
Image-to-Text
Transformers Supports Multiple Languages
google
8,601
37
Pix2struct Chartqa Base
Apache-2.0
Pix2Struct is an image encoder-text decoder model trained on image-text pairs for multiple tasks. This checkpoint is fine-tuned for chart question answering.
Image-to-Text
Transformers Supports Multiple Languages
google
181
8
Donut Base Finetuned Latvian Receipts
MIT
This model is a fine-tuned version of donut-base, trained on a Latvian receipt dataset and used primarily for receipt image processing tasks.
Text Recognition
Transformers
Inesence
31
0
Doctr Torch Crnn Mobilenet V3 Large French
An optical character recognition (OCR) model based on TensorFlow 2 and PyTorch, supporting multilingual text detection and recognition.
Text Recognition
Transformers Supports Multiple Languages
Felix92
33
3
Doctr Tf Crnn Vgg16 Bn French
An optical character recognition (OCR) model based on TensorFlow 2 and PyTorch, supporting multilingual document recognition.
Text Recognition
Transformers Supports Multiple Languages
Felix92
16
0